Two-Stage Score Normalization Method for Spoken Term Detection<sup>*</sup>

doi:10.16451/j.cnki.issn1003-6059.201603003


Abstract: Score normalization is an essential step in a spoken term detection (STD) system. This paper proposes a two-stage score normalization method. First, two features, rank-p and relative-to-max, are introduced into a discriminative score normalization method so that the confidence scores of correct and incorrect candidates are more clearly separated, which makes keyword verification easier. Second, a score normalization method based on optimizing the term-weighted value evaluation metric is applied to obtain the best STD performance. Experimental results show that the proposed method combines the advantages of discriminative and metric-based score normalization and outperforms the best single score normalization method.

Key words: Spoken Term Detection, Score Normalization, Discriminative Model, Confidence Score
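The two stages summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions: the abstract does not define the two features, so rank-p is taken here to be the rank of a candidate among all candidates of the same keyword, and relative-to-max to be the ratio of a candidate's score to the highest score for that keyword. The discriminative classifier that would consume these features is omitted. For the second stage, keyword-specific thresholding is used as one well-known normalization derived from the term-weighted value metric; the false-alarm weight `BETA`, the hit layout (dicts with `keyword` and `score`), and all function names are hypothetical.

```python
import math
from collections import defaultdict

BETA = 999.9  # assumed false-alarm weight of the term-weighted value metric


def group_by_keyword(hits):
    """Group candidate hits (dicts with 'keyword' and 'score') by keyword."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["keyword"]].append(hit)
    return groups


def add_rank_features(hits):
    """Stage 1 features (assumed definitions): per-keyword rank and score / top score."""
    out = []
    for group in group_by_keyword(hits).values():
        ranked = sorted(group, key=lambda h: h["score"], reverse=True)
        top = ranked[0]["score"]
        for rank, hit in enumerate(ranked, start=1):
            rel = hit["score"] / top if top > 0 else 0.0
            out.append({**hit, "rank": rank, "rel_to_max": rel})
    return out


def kst_threshold(scores, speech_seconds, beta=BETA):
    """Keyword-specific threshold; the sum of scores estimates the true hit count."""
    n_true = sum(scores)
    return beta * n_true / (speech_seconds + (beta - 1.0) * n_true)


def kst_normalize(hits, speech_seconds, beta=BETA):
    """Stage 2: map each keyword's own threshold to 0.5 with a power transform."""
    out = []
    for group in group_by_keyword(hits).values():
        thr = kst_threshold([h["score"] for h in group], speech_seconds, beta)
        if 0.0 < thr < 1.0:
            exponent = math.log(0.5) / math.log(thr)
        else:
            exponent = 1.0  # degenerate threshold: leave scores unchanged
        for hit in group:
            out.append({**hit, "norm_score": hit["score"] ** exponent})
    return out


hits = [
    {"keyword": "alpha", "score": 0.9},
    {"keyword": "alpha", "score": 0.3},
    {"keyword": "beta", "score": 0.2},
]
featured = add_rank_features(hits)
normalized = kst_normalize(featured, speech_seconds=36000.0)
```

After the second stage, a single global threshold of 0.5 can be applied to `norm_score` for every keyword, which is the practical reason for normalizing per keyword rather than thresholding raw scores.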

Received: 2014-11-18

Chinese Library Classification (CLC) number: TN 912.34

Funding: Supported by the National Natural Science Foundation of China (No. 61403415, 61175017)

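About the authors: LI Peng, male, born in 1989, master's degree. His main research interest is spoken keyword detection. E-mail: 15137172798@163.com. QU Dan (corresponding author), female, born in 1974, Ph.D., associate professor. Her main research interests are speech processing and recognition. E-mail: qudanqudan@sina.com.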

Cite this article:

LI Peng, QU Dan. Two-Stage Score Normalization Method for Spoken Term Detection[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2016, 29(3): 216-222.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.201603003 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2016/V29/I3/216
